We propose and develop a Lexicocalorimeter: an online, interactive instrument for measuring the 
“caloric content” of social media and other large-scale texts. We do so by constructing extensive yet 
improvable tables of food and activity related phrases, and respectively assigning them with sourced 
estimates of caloric intake and expenditure. We show that for Twitter, our naive measures of “caloric 
input”, “caloric output”, and the ratio of these measures are all strong correlates with health and 
well-being measures for the contiguous United States. Our caloric balance measure in many cases 
outperforms both its constituent quantities; is tunable to specific health and well-being measures 
such as diabetes rates; has the capability of providing a real-time signal reflecting a population’s 
health; and has the potential to be used alongside traditional survey data in the development of 
public policy and collective self-awareness. Because our Lexicocalorimeter is a linear superposition of 
principled phrase scores, we also show we can move beyond correlations to explore what people talk 
about in collective detail, and assist in the understanding and explanation of how population-scale 
conditions vary, a capacity unavailable to black-box type methods. 


I. INTRODUCTION 

Online instruments designed to measure social, psychological,
and physical well-being at a population level are
becoming essential for public policy purposes and public
health monitoring [HE]. These data-centric gauges both
empower the general public with information to allow
comparisons of communities at all scales, and naturally
complement the broad, established set of more readily
measurable socioeconomic indicators such as wage
growth, crime rates, and housing prices.

Overall well-being, or quality of life, depends on
many factors and is complex to measure [3]. Existing
techniques for estimating population well-being range
from traditional surveys mm to estimates of smile-to-frown
ratios captured automatically on camera in public
spaces [5], and vary widely in the types of data they
amass, collection methods, cost, time scales involved, and
degree of intrusion. Partly in response to policy makers’
desire for simple “one number” quantification of complex
systems (arguably a general human proclivity), many
measures are composite in nature. Two examples are (1)
the Gallup Well-Being Index, which is based on factors
such as life evaluation, emotional health, physical health,
healthy behavior, work environment, and basic access to
necessary resources [4]; and (2) the Living Conditions
measure developed by the United States Census Bureau,
which is derived from housing conditions, neighborhood
conditions, basic needs met, a “full set” of appliances,
and access to help if needed [6].

While such measures will always have their place, we
venture that we must resist oversimplification. The dashboard
of society should be just that: a rich set of incompatible
instruments whose informational content may be
observed individually and in total, not unlike the
input needed for flying a plane, where knowledge of just a
single number representing “things are going well” would
be untenable. The construction of data-centric instruments
for social systems that deliver more direct, interpretable
measures is therefore of great importance as we
move forward into the age of ubiquitous (but not complete)
measurement.

With the explosive growth of online activity and social
media around the world, the massive amount of real-time
data created directly by populations of interest has
become an increasingly attractive and fruitful source for
analysis. Despite the limitation that social media users
in the United States are not a random sample of the US
population |H, there is a wealth of information in these
data sets and uneven sampling can often be accommodated.

Indeed, online activity is now considered by many
to be a promising data source for detecting health
conditions and gathering public-health information
nniE], and within the last decade, researchers have
constructed a range of online public-health instruments
with varying degrees of success. The maturing of these
and related instruments along with theoretical models
will ultimately fundamentally inform the limits of
characterization and predictability of social systems.

In the next two subsections, we cover related research 
and then describe our approach to measuring the “caloric 
content” of text. 


A. Previous work 

For a general overview of work relevant to our present 
effort, we briefly summarize related research concerning 
public health and well-being in connection with a range 
of social media and online data sets. 

In the difficult realm of predicting pandemics [T^,
Google Flu Trends [13] enjoyed early success and acclaim.
Initially based very simply on search terms, the instrument
proved unsurprisingly to be imperfect and in need
of a more sophisticated approach m.

In work by several of the current authors and colleagues,
Mitchell et al. measured the happiness of tweets
across the US and found strong correlations with other
indices of well-being at city and state level, such as the
Gallup Well-being Index; the Peace Index; the America’s
Health Ranking composite index of Behavior, Community
and Environment, Policy and Clinical Care metrics;
and gun violence (negative correlation) [15]. Using the
same instrument in 10 languages, the Hedonometer, we
have also shown that the emotional content of tweets
tracks major world events mm.

Paul and Dredze found that states with higher obesity
rates have more tweets about obesity, and states
with higher smoking rates have more tweets about cancer
m. They also found a negative correlation between
exercise and frequency of tweeting about ailments,
suggesting “Twitter users are less likely to become sick in
states where people exercise.” They further found health
care coverage rates to be negatively correlated with
likelihood of posting tweets about diseases.

Chunara et al. recently found that activity-related
interests on Facebook are negatively correlated with
being overweight and obese, while interest in television
is positively correlated with the same m.

In an analysis of online recipe queries, West et al.
found that the number of patients admitted to the emergency
room of a major urban hospital in Washington,
DC for congestive heart failure (CHF) each month was
significantly correlated with average sodium per recipe
searched for on the Web in the same month [18].

Eichstaedt and colleagues [19] have demonstrated that
psychological language on Twitter outperforms certain
composite socioeconomic indices in predicting heart disease
at the county level. They were able to show in particular
that the expression of negative emotions such as
anger on Twitter could be taken as a kind of risk factor
at the population scale.

On a US county level, Culotta [20] found that Twitter
activity provided a more “fine-grained representation”
of community health than demographics alone with the
prevalence of particular words that indicate, for example,
television habits, or negative engagement.

Finally, in work directly related to our present study,
Abbar et al. m have recently performed a similar analysis
of translating food terms used on Twitter into calories.
They found a correlation between Twitter calories and
obesity and diabetes rates for the US, and explored how
food-themed interactions over social networks vary with
connectedness, finding suggestions of social contagion.
While our approach and results are largely sympathetic,
our work incorporates estimates of physical activity
which we will show provides essential extra information
regarding health; introduces a phrase extraction method
we call serial partitioning; and leads to an online
implementation, paving the way for a real-time instrument as
part of our proposed ‘panometer.’ We also note that we
carried out our work concurrently and independently.


B. Lexicocalometrics 

From the preceding list of studies, it has become clear
that we can estimate population-scale levels of health and
well-being through social media. Here, we examine the
words and phrases people post publicly about food and
physical activity on Twitter on a statewide level for the
contiguous United States (48 states along with the District
of Columbia). As we explain fully below in Sec. II A
and Methods and Materials, Sec. IV, we group categorically
similar words and phrases into lemmas, and we then
assign caloric values to these lemmas using the terms and
notation “caloric input” for food, Cin, and “caloric output”
for activity, Cout. We define the ratio of caloric
output to caloric input to be a third quantity, “caloric
ratio”:

$$ C_{\rm rat} = \frac{C_{\rm out}}{C_{\rm in}}. \qquad (1) $$

While we will focus largely on the three quantities Cin,
Cout, and Crat, we will also explore “caloric difference”,
an alternate combination of Cin and Cout involving a single
parameter:

$$ C_{\rm diff}(\alpha) = \alpha C_{\rm out} - (1 - \alpha) C_{\rm in}, \qquad (2) $$

where $0 < \alpha < 1$. We use “phrase shifts” [5] to show
how specific lemmas (e.g., “apples”, “cake with frosting”,
“white water rafting”, “knitting”, and “watching
tv or movie”) contribute to the caloric texture of states
across the contiguous US. We then correlate all three values
with 37 measures relating to health and well-being,
and we find statistically strong correlations with quantities
such as high blood pressure, inactivity, diabetes
levels, and obesity rates. For ease of language, we will
generally speak of phrases rather than lemmas.
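
As a minimal sketch of Eqs. (1) and (2), the two composite quantities can be computed directly from a population's caloric input and output scores. The function names and the numerical inputs below are hypothetical and chosen only for illustration; they are not values from our data.

```python
def caloric_ratio(c_out: float, c_in: float) -> float:
    """Eq. (1): C_rat = C_out / C_in."""
    if c_in <= 0:
        raise ValueError("c_in must be positive")
    return c_out / c_in


def caloric_difference(c_out: float, c_in: float, alpha: float) -> float:
    """Eq. (2): C_diff(alpha) = alpha * C_out - (1 - alpha) * C_in."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * c_out - (1.0 - alpha) * c_in


# Hypothetical scores for one population (illustrative values only).
c_in, c_out = 260.0, 200.0
print(caloric_ratio(c_out, c_in))
print(caloric_difference(c_out, c_in, alpha=0.5))
```

Moving alpha toward 1 weights the measure toward activity, and moving it toward 0 weights it toward food, which is the single-parameter tunability described in the text.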

We have also generated an accompanying online, interactive
instrument for exploring health patterns through
the lens of “Twitter calories”: the Lexicocalorimeter. An
initial, fixed version of the instrument may be accessed at
this paper’s Online Appendices,
http://compstorylab.org/share/papers/alajajian2015a/,
with an evolvable, production version housed within our
larger measurement platform http://panometer.org at
http://panometer.org/instruments/lexicocalorimeter (all
code for these sites can be found at
https://github.com/cindyreagan/lexicocalorimeter-appendix). We
note that while our online instrument is based on Twitter,
it may in principle be used on any sufficiently large
text source, social media or otherwise, such as Facebook.

From this point, we structure the core of our paper as
follows. In Sec. II, we establish and discuss our findings
in depth. Specifically, we: (1) Outline our text analysis
of a Twitter corpus from 2011-2012 (Sec. II A), reserving
full details for Methods and Materials in Sec. IV; (2)
Present caloric maps of the contiguous US contrasting
the 48 states and DC through histograms and phrase
shifts (Sec. II B); and (3) Examine how Cin, Cout, Crat,
and Cdiff(α) correlate with a suite of measures relating
to health and well-being. In the Supporting Information,
we provide a sample of confirmatory figures as well as all
shareable data sets (e.g., IDs for all tweets). We offer
concluding thoughts in Sec. III.


II. ANALYSIS AND RESULTS 


A. Estimating calories from phrases 


We used all available geotagged tweets from 2011 and
2012 (around 50 million) from a bounding box of the
contiguous US, using Twitter’s garden hose sample (which
is a sample of approximately 10% of all tweets, including
those that are not geotagged) and the geotag feature to
determine from which of the 48 continental states and
the District of Columbia each tweet came. From this
sample, we counted the total number of times each food
and physical activity phrase in our database was tweeted
about in each of the 48 continental states and the District
of Columbia (see Sec. IV and Dataset S1 at
https://dx.doi.org/10.6084/m9.figshare.4530965.v1 for all
tweet IDs). We then used these counts to determine the
average caloric input Cin from food phrase tweets and the
average caloric output Cout from physical activity phrase
tweets as follows.
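
The counting step can be illustrated with a deliberately naive sketch. It assumes each tweet has already been assigned to a state from its geotag, and it matches phrases with a simple whole-word, longest-first regular expression; this is a stand-in for illustration only and is not the serial partitioning method used for the actual analysis. The function name and the sample inputs are hypothetical.

```python
import re
from collections import Counter, defaultdict


def count_phrases_by_state(tweets, phrases):
    """Count whole-word, case-insensitive phrase occurrences per state.

    tweets  : iterable of (state, text) pairs (state already resolved from the geotag)
    phrases : iterable of lowercase phrase strings (hypothetical stand-in for the database)
    """
    # Longest phrases first so "ice cream" is preferred over "ice" at the same position.
    ordered = sorted(set(phrases), key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    counts = defaultdict(Counter)
    for state, text in tweets:
        for match in pattern.findall(text):
            counts[state][match.lower()] += 1
    return counts


# Made-up example tweets, for illustration only.
sample = [("VT", "Skiing then pizza, more PIZZA"), ("MS", "ice cream and cake")]
print(count_phrases_by_state(sample, ["pizza", "skiing", "ice cream", "cream", "cake"]))
```

The resulting per-state counts play the role of the phrase frequencies that enter the caloric averages.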


First, we equate each food phrase $s$ with the calories
per 100 grams of that food, using the notation $C_{\rm in}(s)$.
(We also explored serving sizes but the databases available
proved far from complete.) We then compute the
caloric input for a given text $T$ as:

$$ C_{\rm in}(T) = \frac{\sum_{s \in S_{\rm in}} C_{\rm in}(s)\, f(s|T)}{\sum_{s \in S_{\rm in}} f(s|T)} = \sum_{s \in S_{\rm in}} C_{\rm in}(s)\, p(s|T), \qquad (3) $$

where $f(s|T)$ is the frequency of phrase $s$ in text $T$,
$p(s|T)$ is the normalized version, and $S_{\rm in}$ is the set of
all food phrases in our database.
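
The frequency-weighted average in Eq. (3) can be sketched as follows. The function name, the dictionary layout, and the calorie values are assumptions made for illustration; the numbers are not taken from our phrase database.

```python
def caloric_average(freqs, calories):
    """Frequency-weighted average of per-phrase calories, as in Eq. (3).

    freqs    : dict phrase -> count f(s|T) in the text T
    calories : dict phrase -> calories per 100 g, C_in(s)
    Phrases absent from `calories` are ignored (they lie outside S_in).
    """
    known = {s: f for s, f in freqs.items() if s in calories and f > 0}
    total = sum(known.values())
    if total == 0:
        raise ValueError("no database phrases found in text")
    # p(s|T) = f(s|T) / total, so this is sum_s C_in(s) * p(s|T).
    return sum(calories[s] * f / total for s, f in known.items())


# Illustrative, made-up values.
calories = {"pizza": 266.0, "apples": 52.0}
freqs = {"pizza": 3, "apples": 1, "hello": 10}
print(caloric_average(freqs, calories))  # (3*266 + 1*52) / 4 = 212.5
```

Because only database phrases enter the normalization, words outside the database (here "hello") do not dilute the score.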

Second, for each tweeted physical activity phrase, we
use an estimate of the Metabolic Equivalent of Tasks, or
METs, which we then convert to calories expended per
hour, assuming a weight of 80.7 kilograms, the average
weight of a North American adult [^. Analogous to
$C_{\rm in}(T)$ above, we then have

$$ C_{\rm out}(T) = \sum_{s \in S_{\rm out}} C_{\rm out}(s)\, p(s|T), \qquad (4) $$

where now $S_{\rm out}$ is the set of all phrases in our activity
database.
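
A hedged sketch of the activity side follows. It assumes the common convention that 1 MET corresponds to roughly 1 kcal per kilogram of body weight per hour; the text states the 80.7 kg body weight but not the conversion factor, so that convention is an assumption here. The function names and MET values are hypothetical and for illustration only.

```python
WEIGHT_KG = 80.7  # average adult weight assumed in the text


def mets_to_kcal_per_hour(mets, weight_kg=WEIGHT_KG):
    """Convert METs to kcal/hour, ASSUMING 1 MET = 1 kcal per kg per hour."""
    return mets * weight_kg


def caloric_output(freqs, mets_table, weight_kg=WEIGHT_KG):
    """Eq. (4): frequency-weighted average of per-phrase kcal/hour.

    freqs      : dict phrase -> count f(s|T)
    mets_table : dict phrase -> MET estimate (hypothetical stand-in for the database)
    """
    known = {s: f for s, f in freqs.items() if s in mets_table and f > 0}
    total = sum(known.values())
    if total == 0:
        raise ValueError("no activity phrases found in text")
    return sum(
        mets_to_kcal_per_hour(mets_table[s], weight_kg) * f / total
        for s, f in known.items()
    )


# Illustrative, made-up MET values.
mets_table = {"running": 8.0, "sitting": 1.3}
print(caloric_output({"running": 1, "sitting": 3}, mets_table))
```

Under this assumed convention, a change in the assumed body weight rescales every activity score by the same factor, so it affects absolute values but not comparisons between populations.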

We emphasize that both our food and exercise phrase
data sets and Twitter databases are necessarily incomplete
in nature. The values of Cin and Cout are thus not
meaningful as absolute numbers but rather have power
for comparisons. We also acknowledge that our equivalences
are crude (e.g., each mention of a specific food is
naively turned into the calories associated with 100 grams
of that food) and later on we address our choices in more
depth. Nevertheless, our method is pragmatic yet, as we
will show, effective, and offers clear directions for future
improvement.

For simplicity and ultimately because the results are
sufficiently strong, we did not filter tweets beyond their
geographic location. Tweets may thus come from individuals,
restaurants, sports stores, resorts, news outlets,
marketers, fitness apps, tourists, and so on, and further
improvements and refinements may be achieved by
appropriately constraining the Twitter corpus.

Finally, we take the ratio of Cout(T) to Cin(T) to
obtain the text’s caloric ratio Crat(T). In general, we
observe that a higher value of Crat(T) at the population
scale would appear to be intuitively better, up to some
limit indicating negative energy balance. We note that
Crat = 1 is not salient and should not be taken to mean
a population is ‘balanced calorically’. As we discuss later,
using the difference, what we call Caloric Difference,
a generalization of Cout − Cin, generates similar results
but, from a framing perspective, we have reservations in
creating a scale with a 0 point given the approximate
nature of our measures.


B. Caloric maps of the contiguous US 

We now move to our central analysis and exploration of 
how our lexicocalorimetric measure varies geographically. 
We start with visual representations and then continue 
on to more detailed comparisons. 
FIG. 1. Choropleth maps indicating (A) caloric input Cin and (B) caloric output Cout in the contiguous United States
(including the District of Columbia) based on 50 million geotagged tweets taken from 2011-2012. For both maps, darker means
higher values as per the color bars on the right. The histograms in Figs. 5, S2, and S3 show the specific rankings according
to these two variables and also Crat (see Fig. 3). The overlaid phrase lemmas are the most dominant contributors to Cin and
Cout, almost universally “pizza” and “watching tv or movie”.

In Fig. 1, we show two choropleth maps of our overall
2011-2012 measures of Twitter’s caloric input Cin and
caloric output Cout. For both maps and those that follow,
quantities increase as colors move from light to dark
green.

These maps immediately allow for some basic observations
which we will delve into and harden up as our
analysis proceeds. For the food calories map, we see Cin
is generally largest in the Midwest and the south while
Colorado and Maine stand out as states with the lowest
calories.

FIG. 2. The same choropleth maps for Cin and Cout presented in Fig. 1 but now with phrases whose increased usage contributes
the most to a population’s Cin and Cout differing from the overall averages of these measures. See Sec. II D. For example, tweets
from Vermont, which was above average for both Cin and Cout for 2011-2012, disproportionately contain “bacon” and “skiing”.
Michigan was above average for Cin and below for Cout in 2011-2012, and the most distinguishing phrases are “chocolate
candy” and “laying down”. See Figs. 5, S2, and S3 for ordered rankings.

We see a different texture in the activity calories map
with the highest caloric output according to our measure
appearing in the three-state block of Wyoming, Colorado,
and Utah, as well as Vermont. Tweet-based caloric output
drops to a low in Mississippi and the surrounding
states, while Michigan also appears to have a low value
of Cout.

For the food and activities maps in Fig. 1, we also
show the most dominant phrase for each population’s Cin
and Cout scores. Almost uniformly, “pizza” (high calorie
food) and “watching tv or movie” (low calorie activity)
are the lemmas with the largest contributions, a function
of both volume and caloric scores. Only Mississippi
(“ice cream”) and Wyoming (“cookies”) are exceptions,
though “pizza” is still near the top for both.

In Fig. 2, we present the same choropleth maps
from Fig. 1 but now with the phrase most distinguishing
a population. Specifically, we show phrases whose
increased prevalence most contributes to moving a
population’s Twitter calorie scores away from the overall
average for the contiguous US. For example, if a population’s
Cin is above average, we find the food phrase whose
frequency coupled with its caloric content most strongly
moves the population’s Cin up from the average. (We
explain in full how we determine these phrases later with
phrase shifts in Sec. II D.) We now see a diverse spread of
terms. We find a number of phrases make for reasonable
representations:


• “lobster” in Maine and Massachusetts; 

• “grits” in Georgia; 

• “skiing” in Vermont, New Hampshire, and Utah; 

• and “running” in Colorado and a number of other 
locations. 


Prototypical unhealthy foods rise to the top in various
states:

• “donuts” in Texas;

• “cake” in Mississippi;

• “chocolate candy” in Louisiana;

• and “cookies” in Indiana.

By contrast, a few “virtuous” foodstuffs appear such as
“green beans” in Oregon and “tomato” in California.

Our activity list also includes some rather low intensity
ones and we see:

• “eating” rising to the top in Texas, the south, and
a number of other states;

• “watching tv or movie” in Pennsylvania and elsewhere;

• “sitting” in Tennessee;

• “talking on the phone” in Delaware;

• “getting my nails done” in New Jersey;

• and simply “laying down” in Michigan.

FIG. 3. Choropleth for caloric ratio Crat = Cout/Cin. See
Figs. 5, S2, and S3 for ordered rankings.

Now, we do not pretend that these phrases all come
from individuals diligently recording their present meals
or activities. Apart from tweets from individuals, our
database contains tweets from companies, advertisers,
resorts, and so on. And some phrases are problematic
in their generality of meaning, most especially “running”
(the word “run” currently has the most meanings
in the Oxford English Dictionary). Nevertheless, as we
dig deeper into all the phrases found for a particular
state, we will continue to find commonsensical lexical
patterns.

In Fig. 3, we show a choropleth map for caloric ratio,
Crat. We see that the highest values of Crat are found in
Colorado, Wyoming, and Vermont, and secondarily for
Maine, Minnesota, Oregon, and Utah. Low values of Crat
appear in the region comprising Mississippi, Louisiana,
Alabama, and Arkansas, as well as West Virginia.

An initial visual comparison of Figs. 1 and 3 suggests
that Cout is more well aligned with Crat than Cin is.
The reason is that for the present version of the
Lexicocalorimeter, Cout has a larger dynamic range than Cin,
roughly 160 to 210 versus 250 to 285, giving ratios of
210/160 ≈ 1.31 and 285/250 = 1.14. We could assert that Cin is
fundamentally less informative but:

1. In Sec. II E, we will find that some measures relating
to health and well-being correlate more strongly
with Cin and some with Cout;

2. We may adjust the dynamic range of either measure
by rescaling, introducing a kind of tunability [5] to
the instrument (a feature we will reserve for future
iterations); and

3. Because our food phrase database is a factor of 10
smaller than our activity phrase one, revisions of
our instrument may elevate the power of Cin.


To provide some support for point 1, we compare Cout
and Cin in Fig. 4 (see also Fig. S1). Importantly, we
see that the two measures are indeed not well correlated,
indicating they contain different kinds of information
(Pearson correlation coefficient $\rho_{\rm p} \simeq -0.13$, p-value
= 0.39). This demonstrates why we might expect Cin
or Cout to separately correlate more strongly with other
population-level measures, and justifies forming a dashboard
using both Cin and Cout as well as the composite
measure of Crat.

FIG. 4. Plots for the contiguous US showing the lack
of correlation between caloric input Cin and caloric output
Cout, demonstrating their separate value as they bear different
kinds of information. The Pearson correlation coefficient
$\rho_{\rm p}$ is -0.13 and the best line of fit slope is m = -1.64. Fig. S1
adds plots of Crat as a function of Cin and Cout.

Regarding point 2 above, we have evidently made a
number of choices in computing Cin and Cout that mean
we have already introduced an arbitrary tuning of the
ratio Crat (e.g., assuming 100 grams of a food and an
hour’s worth of activity). Having no principled way of
rescaling (i.e., one that is not a function of the data set
being studied), we have chosen to leave the measures as
computed. As we discuss later, in future iterations we
envisage for the Caloric Difference version that introducing
tunability of the dynamic ranges of Cin and Cout
(altering the bias of the measure toward food or activity)
will allow the Lexicocalorimeter to be refined for a range
of purposes such as estimating correlates of diabetes levels
versus cancer rates (see Sec. II E).


C. Rankings for the contiguous US

Having taken in the maps of our three measures Cin,
Cout, and Crat, we now explore the rankings quantitatively,
first through the histograms shown in Fig. 5. We
order the 48 states and DC by Crat (rightmost plot) and
all bars are relative to the overall average of the specific
measure. Numeric rankings for each measure are given
next to each bar. In Figs. S2 and S3, we present the same
histograms re-sorted respectively by Cin and Cout.

As was indicated by our inspection of the choropleth
maps, we do indeed see that Crat is more strongly driven
by Cout than Cin due to the former’s larger dynamic
range. The states with the highest values of Crat achieve
their scores through high levels of Cout but more variable
levels of Cin. Wyoming (23), Vermont (21), and
Utah (25) are all middling in Cin while Colorado (48)
and Maine (49) have the lowest ranks for caloric intake.
At the trailing end, we see by contrast that low activity
ranks are coupled with high ranks for caloric intake.

A few of the more anomalous states are evident both
in the Cin and Cout histograms and as those appearing
farthest away from the best line of fit in the scatter plot
of Fig. 4. South Dakota has high values of both Cin and
Cout (ranks of 1 and 7) that arrange to give it a ranking
of 25 for Crat. Maryland, ranking 42nd and 45th in Cin
and Cout, is the only state in the ‘bottom’ 10 of both
measures.

D. Phrase shifts 


In our work on measuring happiness, we have developed
and extensively used “word shifts” to show which
words make a given text appear more positive than
another text in aggregate (see [2] and [Ej). Such
visualizations not only provide our necessary test, but also
allow us to draw insight from the lexical tapestry of
texts. Here, we will explain and use analogously constructed
phrase shifts for both Cin and Cout to examine
the states at the extremes of our Crat rankings, Colorado
and Mississippi. Interactive food and activity
phrase shifts for the 49 regions of the contiguous US form
a central part of our online Lexicocalorimeter:
http://panometer.org/instruments/lexicocalorimeter.

We start with two texts: a base “reference text” $T_{\rm ref}$,
and a “comparison text” $T_{\rm comp}$ which we wish to compare
to $T_{\rm ref}$. In this paper, we will use the contiguous US as
the reference text (weighting the phrase distributions of
each state equally), but in principle any text can be used
(e.g., in comparing two states, one would be selected as
a reference). Our interest is in determining which words
or phrases most contribute to or go against the difference
in estimated calories, $C_{i/o}(T_{\rm comp}) - C_{i/o}(T_{\rm ref})$, where i/o
stands for in or out. Following [2] and using Eqs. (3) and (4), we
can express the difference as

$$ C_{i/o}(T_{\rm comp}) - C_{i/o}(T_{\rm ref}) = \sum_{s \in S_{i/o}} C_{i/o}(s) \left[ p(s|T_{\rm comp}) - p(s|T_{\rm ref}) \right] $$

$$ = \sum_{s \in S_{i/o}} \left[ C_{i/o}(s) - C_{i/o}(T_{\rm ref}) \right] \left[ p(s|T_{\rm comp}) - p(s|T_{\rm ref}) \right]. \qquad (5) $$

The second equality holds because both normalized
distributions sum to 1, so that
$\sum_{s \in S_{i/o}} \left[ p(s|T_{\rm comp}) - p(s|T_{\rm ref}) \right] = 0$.
FIG. 5. Histograms of caloric intake Cin (food), caloric output Cout (activity), and caloric ratio Crat for the states of the
contiguous US, all ranked by decreasing Crat. Bars indicate the difference in the three quantities from the overall average with
colors corresponding to those used in Figs. 1, 2, and 3. We provide the same set of histograms re-sorted by Cin and Cout in
Figs. S2 and S3.


We now have a sum of contributions due to all phrases. We
normalize these contributions as percentages and annotate
their structure as follows:

$$ \delta C_{i/o}(s) = \frac{100}{\left| C_{i/o}^{\rm (comp)} - C_{i/o}^{\rm (ref)} \right|} \underbrace{\left[ C_{i/o}(s) - C_{i/o}^{\rm (ref)} \right]}_{+/-} \underbrace{\left[ p_s^{\rm (comp)} - p_s^{\rm (ref)} \right]}_{\uparrow/\downarrow}, \qquad (6) $$

where $\sum_{s \in S_{i/o}} \delta C_{i/o}(s) = \pm 100$. We use the symbols
+/− and ↑/↓ to respectively encode whether the calories
of a phrase exceed the average of the reference text,
and whether a phrase is being used more or less in the
comparison text. We call $\delta C_{i/o}(s)$ the “per food/activity
phrase caloric expenditure shift”. Finally, we sort phrases
by the absolute value of $\delta C_{i/o}(s)$ to create each phrase
shift.
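
The construction of a phrase shift can be sketched as below, under the assumption that each text is supplied as a normalized phrase distribution and that every phrase has a caloric score. The function name, the ASCII labels ("+up", "-down", and so on, standing in for the +/− and ↑/↓ symbols), and the example numbers are hypothetical and serve only to illustrate the normalization and sorting steps.

```python
def phrase_shift(p_comp, p_ref, calories):
    """Per-phrase percentage contributions to C(T_comp) - C(T_ref).

    p_comp, p_ref : dict phrase -> normalized frequency (each summing to 1)
    calories      : dict phrase -> caloric score C(s)
    Returns (phrase, shift, label) tuples sorted by decreasing |shift|.
    """
    c_ref = sum(calories[s] * p for s, p in p_ref.items())
    c_comp = sum(calories[s] * p for s, p in p_comp.items())
    diff = c_comp - c_ref
    if diff == 0:
        raise ValueError("texts have identical scores; shift is undefined")
    rows = []
    for s in set(p_comp) | set(p_ref):
        dp = p_comp.get(s, 0.0) - p_ref.get(s, 0.0)  # used more or less often
        dc = calories[s] - c_ref                      # above or below reference average
        shift = 100.0 * dc * dp / abs(diff)
        label = ("+" if dc > 0 else "-") + ("up" if dp > 0 else "down")
        rows.append((s, shift, label))
    rows.sort(key=lambda r: abs(r[1]), reverse=True)
    return rows


# Made-up scores and distributions, for illustration only.
calories = {"cake": 400.0, "apples": 50.0, "pasta": 150.0}
p_ref = {"cake": 0.3, "apples": 0.3, "pasta": 0.4}
p_comp = {"cake": 0.5, "apples": 0.2, "pasta": 0.3}
for row in phrase_shift(p_comp, p_ref, calories):
    print(row)
```

In this toy case the comparison text scores higher than the reference, so the contributions sum to +100, with the largest share coming from a high calorie phrase being used more often.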


In Fig. 6, we present food and activity phrase shifts
which help to illustrate why:

• Colorado ranks 48/49 for caloric input Cin
(Fig. 6A),

• Mississippi ranks 12/49 for caloric input Cin
(Fig. 6B),

• Colorado ranks 2/49 for caloric output Cout
(Fig. 6C),

• and Mississippi ranks 49/49 for caloric output Cout
(Fig. 6D).

These shifts display phrases that fall into four categories:

+↑, yellow: Phrases representing above average quantities
(here calories) being used more often. Examples:
“cookies” for Mississippi in Fig. 6B and “rock climbing”
for Colorado in Fig. 6C.

−↓, pale blue: Phrases representing below average quantities
being used less often. Examples: “watching tv or movie”
for Mississippi in Fig. 6D and “laying down” for Colorado
in Fig. 6C.

+↓, pale yellow: Phrases representing above average quantities
being used less often. Examples: “chocolate candy” for
Colorado in Fig. 6A and “running” for Mississippi in Fig. 6D.

−↑, blue: Phrases representing below average quantities
being used more often. Examples: “reading” for Colorado
in Fig. 6C and “catfish” for Mississippi in Fig. 6B.

FIG. 6. Phrase shifts for (A) Colorado, food; (B) Mississippi, food; (C) Colorado, activity; and (D) Mississippi, activity,
showing which food phrases and physical activity phrases have the most influence on Colorado and
Mississippi’s top and bottom ranking for caloric ratio, when compared with the average for the contiguous United States.
Horizontal axes show the per food phrase caloric shift (A, B) and the per activity phrase caloric expenditure shift (C, D).
Note that phrases are lemmas representing phrase categories. Overall, Colorado scores lower on Twitter food calories (257.4
versus 271.7) and higher on physical activity calories (203.5 versus 161.3) than Mississippi. We provide interactive phrase
shifts as part of the paper’s Online Appendices at http://compstorylab.org/share/papers/alajajian2015a/ and at
http://panometer.org/instruments/lexicocalorimeter. We explain phrase (word) shifts in the main text (see Eqs. 5 and 6), and
in full depth in [2] and m and online at http://hedonometer.org [23].

Note that depending on the quantity, higher or lower may be “better” and the four categories flip signs in their support. For example, Cin and Cout increase with +↑ phrases; after we examine correlations with health and well-being measures in Sec. II E, we will be able to interpret this as “bad” for Cin and “good” for Cout.
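The four categories can be illustrated with a minimal sketch. It assumes the standard word-shift decomposition, in which each phrase contributes (its calorie value minus the reference average) times (its change in normalized frequency), so that the contributions sum to the difference between the two averages. The function name `phrase_shift`, the ASCII labels (`+up`, `-down`, and so on), and all numbers are hypothetical and are not taken from our phrase tables.

```python
def phrase_shift(calories, ref_counts, cmp_counts):
    """Decompose C_cmp - C_ref into per-phrase contributions.

    calories: phrase -> calorie value (made-up numbers in the demo below)
    ref_counts, cmp_counts: phrase -> raw count in reference / comparison text
    """
    ref_total = sum(ref_counts.get(w, 0) for w in calories)
    cmp_total = sum(cmp_counts.get(w, 0) for w in calories)
    # Reference average: frequency-weighted mean over the phrase list.
    c_ref = sum(calories[w] * ref_counts.get(w, 0) / ref_total for w in calories)
    rows = []
    for w in calories:
        deviation = calories[w] - c_ref  # above (+) or below (-) average
        freq_change = (cmp_counts.get(w, 0) / cmp_total
                       - ref_counts.get(w, 0) / ref_total)  # used more or less
        label = ("+" if deviation > 0 else "-") + ("up" if freq_change > 0 else "down")
        rows.append((w, deviation * freq_change, label))
    # Largest absolute contributions first, as in a phrase shift plot.
    rows.sort(key=lambda r: abs(r[1]), reverse=True)
    return c_ref, rows


calories = {"cake": 400.0, "pasta": 130.0, "bacon": 540.0}  # hypothetical values
reference = {"cake": 10, "pasta": 10, "bacon": 10}
state = {"cake": 5, "pasta": 20, "bacon": 5}
c_ref, rows = phrase_shift(calories, reference, state)
net_change = sum(contribution for _, contribution, _ in rows)
```

In this toy case the comparison text mentions the low calorie phrase more often and both high calorie phrases less often, so every contribution lowers the average and `net_change` is negative.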

At the top of each phrase shift, the bars indicate the 
total contribution of each of the four types of phrases, 
and the black bar the net change. We see that the four 
net changes arise in different ways. 

• Fig. 6: Colorado is lower than average for Cin largely due to tweeting more about relatively low calorie (per 100 grams) foods: “noodles”, “egg”, “pasta”, and “turkey”. We also find fewer tweets about high calorie foods such as “candy”, “cake”, and “cookies.” Going against these phrases, we see Colorado does tweet relatively more about “bacon” and “olive oil”, and less about some relatively lower calorie foods: “chicken”, “ice cream”, “shrimp”, and “corn”. We note that this does not mean these foods are low calorie in absolute terms (“ice cream” is a good example), just that 100 grams of them are low calorie in comparison to the US baseline.

• Fig. 6: Mississippi almost equally tweets less about a variety of low calorie foods, e.g., “pasta”, “banana”, and “crab” (pale blue bar) while also tweeting more about the complementary range of such foods including “shrimp”, “peaches”, and “pineapple” (dark blue bar). The modest net gain is mostly due to a small increase in tweeting about high calorie foods such as “cake”, “cookies”, and “sausage”.

• Fig. 6: For physical activity, tweets from Colorado show a preponderance of relatively high caloric expenditure phrases (+↑, yellow) including “running”, “skiing”, “hiking”, “snowboarding” and so on. Tweeting less about low effort activities is the only other contribution of any substance: Colorado tweets less about “eating”, “laying down”, and “watching tv or movie”.

• Fig. 6D: Mississippi’s low ranking in activity is largely due to tweeting less about high output activities (+↓, pale yellow): less “running”, “dancing”, “walking”, and “biking”. The second most important category is an increase in low output activity phrases such as “eating”, “attending church”, and “talking on the phone.”

In the supplementary figures, we complement the four phrase shifts of Fig. 6 by showing the top 23 phrases for each of the four ways phrases may contribute. Interactive phrase shifts for all of the contiguous US are housed at http://panometer.org/instruments/lexicocalorimeter.

Overall, we find the lexical texture afforded by our phrase shifts is generally convincing, but we expect future improvements in our food and activity data sets will iron out some oddities (we again use the example of ice cream). We also note that phrase shifts are very sensitive, and that terms that appear to be evaluated incorrectly may easily be removed from the phrase set; doing so will minimally change the overall score for sufficiently large texts.

E. Correlations with other health and well-being 
measures 

We now turn to a suite of statistical comparisons between our three measures (caloric input, caloric output, and caloric ratio) and a collection of demographic, behavioral, health, and psychological quantities.

We use Spearman’s correlation coefficient to examine relationships between Cin, Cout, and Crat and 37 variables variously relating to food and physical activity, “Big Five” personality traits, and health and well-being rankings (a total of 111 comparisons). To correct for multiple comparisons, we calculate the q-value for each correlation coefficient using the Benjamini-Hochberg step-up procedure [31] (the q-value is to be interpreted in the same way as a p-value). We then consider correlations in reference to the standard significance levels of 0.01 and 0.05.
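As a minimal sketch of the multiple-comparison correction, the Benjamini-Hochberg step-up q-values can be computed in a few lines. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions, not our analysis code or results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as p_values."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


example_p = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values
example_q = benjamini_hochberg(example_p)
significant_at_005 = [q <= 0.05 for q in example_q]
```

A correlation is then called significant at level 0.05 (or 0.01) when its q-value falls at or below that level.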

We must first acknowledge that many of the variables we test against our measures are highly correlated with each other. The food and physical activity-related variables are in the areas of physical activity levels, produce intake and availability rates (including trends in public schools), chronic disease rates, and rates of unhealthy habits. Many of these variables are well known to be influenced by diet and physical activity (e.g., obesity rates [25]), and others may be less directly related (e.g., percent of cropland in each state harvested for fruits and vegetables [35]).

To give some grounding for the full set of comparisons, we show in Fig. 7 how six demographic quantities vary with caloric ratio Crat. We see strong correlations with |ρs| > 0.68, and the highest value for the Benjamini-Hochberg q-value is 5.8 × 10^-7.

FIG. 7. Six demographic quantities compared with caloric ratio Crat for the contiguous US. The inset values are the Spearman correlation coefficient ρs and the Benjamini-Hochberg q-value. See Tab. I for a full summary of the 37 demographic quantities studied here.

We present a summary of all results in Tab. I, where we have ordered and numbered demographic quantities in terms of ascending Benjamini-Hochberg q-values for Crat. For comparison and to further demonstrate the robustness of our approach, we reproduce the same analysis (Tabs. S1, S2, and S3) with the inclusion of liquids and for a differential measure Cdiff(α) = αCout − (1 − α)Cin, both with and without liquids. Here, we choose to set the effective means of Cout and Cin equal across the statewide averages (i.e., α⟨Cout⟩ = (1 − α)⟨Cin⟩), resulting in α = 0.598. Overall, we find little variation in our results whether we use Crat or Cdiff(0.598).
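The balancing condition α⟨Cout⟩ = (1 − α)⟨Cin⟩ rearranges to α = ⟨Cin⟩ / (⟨Cin⟩ + ⟨Cout⟩). The sketch below works this through using only the two states quoted in the caption of Fig. 6, so the resulting α is illustrative and differs from the value of 0.598 obtained from all statewide averages. The function names are hypothetical.

```python
def balancing_alpha(c_in, c_out):
    """alpha such that alpha * mean(c_out) == (1 - alpha) * mean(c_in)."""
    mean_in = sum(c_in) / len(c_in)
    mean_out = sum(c_out) / len(c_out)
    return mean_in / (mean_in + mean_out)


def caloric_difference(alpha, c_in, c_out):
    """C_diff(alpha) = alpha * C_out - (1 - alpha) * C_in for one population."""
    return alpha * c_out - (1.0 - alpha) * c_in


# Values quoted in the Fig. 6 caption: (C_in, C_out) per state.
states = {"Colorado": (257.4, 203.5), "Mississippi": (271.7, 161.3)}
alpha = balancing_alpha([v[0] for v in states.values()],
                        [v[1] for v in states.values()])
c_rat = {name: out / inp for name, (inp, out) in states.items()}
c_diff = {name: caloric_difference(alpha, inp, out)
          for name, (inp, out) in states.items()}
```

With this choice of α the mean of Cdiff over the populations used is zero by construction, so positive and negative values indicate above and below average balance.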

Surveying the health-based demographics, we found Crat was significantly correlated with all chronic disease-related rates we tested against (high blood pressure (#3), adult diabetes (#4), adult overweight and obesity (#6), heart disease deaths (#7), adult obesity (#8), childhood overweight and obesity (#13), high cholesterol (#19), and colorectal cancer (#22)). All of these but colorectal cancer rate were also significantly correlated with Cout.

Caloric input Cin results were more mixed. Chronic disease-related rates were also significantly correlated with Cin, with the exception of adult diabetes, childhood overweight and obesity, and high cholesterol, after correcting for multiple comparisons.

The variables relating to unhealthy habits (smoking (#16) and binge drinking rates (#26)) both correlated significantly with all three of our measures, with the one exception of binge drinking and caloric input. The directions of the correlations for these two habits are opposite to each other (e.g., negative for smoking and Crat, positive for binge drinking and Crat), consistent with recent work on alcohol consumption [55].

The two variables relating to physical activity rates (percent of population that has had no physical activity in past 30 days (#1), and percent of population that has been physically active in past 30 days (#2)) correlated significantly with all three of our measures. The two measures relating to rates of physical and mental health (average number of poor mental health days in past 30 days (#24), and average number of poor physical health days in past 30 days (#27)) correlated significantly with both Cout and Crat, but did not correlate significantly with Cin.

The four variables relating to fruit and vegetable consumption rates all correlated significantly with all three of our measures. The variables relating to presence of produce in the state (percent of cropland in each state harvested for fruits and vegetables (#33), percent of census tracts with a healthy food retailer within one-half mile (#35), and percent of schools offering fruits and vegetables at celebrations (#31)) were significantly correlated with Cin but were not correlated with Cout or Crat. Variables relating to local food (number of farmers markets per 100,000 people (#28) and Strolling of the Heifers locavore score (#29)) were not significantly correlated with Cin, but were significantly correlated with Cout.

Health and/or well-being quantity | ρs for Crat | q-val | ρs for Cin | q-val | ρs for Cout | q-val
(An exponent written as ?? is illegible in the source.)
1. % no physical activity in past 30 days [24] | -0.78 | 2.73 × 10^-?? | 0.58 | 5.67 × 10^-?? | -0.66 | 1.51 × 10^-??
2. % have been physically active in past 30 days [24] | 0.78 | 2.73 × 10^-?? | -0.57 | 6.53 × 10^-?? | 0.67 | 1.24 × 10^-??
3. % high blood pressure [24] | -0.77 | 2.73 × 10^-?? | 0.32 | 4.05 × 10^-?? | -0.78 | 2.73 × 10^-??
4. Adult diabetes rate [25] | -0.76 | 5.44 × 10^-?? | 0.29 | 6.09 × 10^-02 | -0.77 | 2.73 × 10^-??
5. CNBC quality of life ranking [26] | -0.76 | 6.75 × 10^-?? | 0.28 | 7.34 × 10^-?? | -0.77 | 3.60 × 10^-??
6. % adult overweight/obesity [27] | -0.73 | 3.16 × 10^-?? | 0.55 | 1.41 × 10^-04 | -0.59 | 3.07 × 10^-??
7. Heart disease death rate [27] | -0.73 | 2.50 × 10^-?? | 0.34 | 2.80 × 10^-?? | -0.73 | 2.30 × 10^-??
8. % adult obesity [25] | -0.72 | 4.30 × 10^-?? | 0.53 | 2.26 × 10^-04 | -0.59 | 2.96 × 10^-??
9. Gallup Wellbeing score [4] | 0.72 | 4.69 × 10^-?? | -0.31 | 4.43 × 10^-02 | 0.73 | 3.99 × 10^-08
10. America’s Health Rankings, overall [24] | -0.72 | 4.10 × 10^-07 | 0.43 | 4.74 × 10^-?? | -0.67 | 2.77 × 10^-??
11. Life expectancy at birth [27] | 0.68 | 5.81 × 10^-07 | -0.4 | 6.91 × 10^-?? | 0.65 | 2.64 × 10^-??
12. % who eat fruit less than once a day [28] | -0.67 | 1.20 × 10^-?? | 0.61 | 1.39 × 10^-?? | -0.51 | 5.35 × 10^-04
13. % child overweight/obesity [27] | -0.64 | 3.53 × 10^-?? | 0.27 | 7.55 × 10^-02 | -0.64 | 3.20 × 10^-??
14. % who eat vegetables less than once a day [28] | -0.61 | 1.39 × 10^-?? | 0.51 | 5.33 × 10^-04 | -0.46 | 1.57 × 10^-??
15. Median daily intake of fruits [28] | 0.6 | 1.98 × 10^-?? | -0.62 | 8.33 × 10^-?? | 0.41 | 5.37 × 10^-??
16. Smoking rate [27] | -0.59 | 2.96 × 10^-?? | 0.51 | 5.26 × 10^-04 | -0.48 | 1.08 × 10^-??
17. Median household income [27] | 0.51 | 5.55 × 10^-?? | -0.53 | 3.27 × 10^-04 | 0.4 | 8.38 × 10^-??
18. Median daily intake of vegetables [28] | 0.5 | 6.10 × 10^-?? | -0.56 | 7.44 × 10^-?? | 0.31 | 4.36 × 10^-??
19. % high cholesterol | -0.49 | 8.11 × 10^-?? | 0.23 | 1.45 × 10^-01 | -0.48 | 9.05 × 10^-04
20. Brain health ranking [29] (lower is better) | -0.49 | 8.11 × 10^-?? | 0.62 | 1.39 × 10^-?? | -0.29 | 5.70 × 10^-??
21. % with bachelor’s degree or higher [6] | 0.46 | 1.57 × 10^-?? | -0.54 | 1.66 × 10^-04 | 0.33 | 2.82 × 10^-??
22. Colorectal cancer rate [25] | -0.44 | 4.09 × 10^-?? | 0.53 | 3.59 × 10^-04 | -0.27 | 8.25 × 10^-??
23. US Census Gini index score [30] (lower is better) | -0.42 | 5.37 × 10^-?? | -0.03 | 8.42 × 10^-01 | -0.5 | 5.55 × 10^-04
24. Avg # poor mental health days, past 30 days [24] | -0.42 | 5.37 × 10^-?? | 0.12 | 4.80 × 10^-01 | -0.48 | 1.06 × 10^-??

25. Neuroticism Big Five personality trait [31] | -0.38 | 1.09 × 10^-?? | 0.2 | 2.03 × 10^-01 | -0.37 | 1.44 × 10^-??
26. Binge drinking rate [24] | 0.37 | 1.46 × 10^-?? | -0.15 | 3.56 × 10^-01 | 0.41 | 5.84 × 10^-??
27. Avg # poor physical health days, past 30 days [24] | -0.35 | 2.34 × 10^-?? | 0.19 | 2.19 × 10^-01 | -0.38 | 1.13 × 10^-02
28. Farmers markets per 100,000 in pop. [2?] | 0.34 | 2.72 × 10^-?? | 0.06 | 7.17 × 10^-01 | 0.42 | 5.14 × 10^-??

29. Strolling of the Heifers locavore score (lower is better) [32] | -0.29 | 5.86 × 10^-?? | -0.3 | 5.41 × 10^-02 | -0.45 | 2.94 × 10^-??
30. Extraversion Big Five personality trait [31] | -0.28 | 6.94 × 10^-?? | 0.03 | 8.42 × 10^-01 | -0.29 | 5.63 × 10^-??
31. % schools offering fruit/veg at celebrations [28] | 0.24 | 1.31 × 10^-?? | -0.46 | 1.96 × 10^-?? | 0.05 | 7.90 × 10^-01
32. Openness Big Five personality trait [31] | 0.23 | 1.31 × 10^-01 | -0.5 | 6.11 × 10^-04 | 0.04 | 8.10 × 10^-01
33. % cropland harvested for fruits/veg [28] | 0.19 | 2.34 × 10^-01 | -0.62 | 1.37 × 10^-?? | -0.04 | 8.10 × 10^-01
34. Conscientiousness Big Five personality trait [31] | -0.12 | 4.81 × 10^-01 | 0.2 | 2.10 × 10^-01 | -0.05 | 7.93 × 10^-01
35. % census tracts, healthy food retailer within 1/2 mile [28] | -0.03 | 8.44 × 10^-01 | -0.52 | 3.68 × 10^-04 | -0.24 | 1.31 × 10^-01
36. George Mason overall freedom ranking [33] (lower is freer) | -0.03 | 8.42 × 10^-01 | -0.11 | 5.15 × 10^-01 | -0.1 | 5.64 × 10^-01
37. Agreeableness Big Five personality trait [31] | -0.01 | 9.61 × 10^-01 | 0.22 | 1.50 × 10^-01 | 0.08 | 6.47 × 10^-01

TABLE I. Spearman correlation coefficients, ρs, and Benjamini-Hochberg q-values for caloric input Cin, caloric output Cout, and caloric ratio Crat = Cout/Cin against demographic data related to food and physical activity, Big Five personality traits [31], health and well-being rankings by state, and socioeconomic status, ordered from strongest to weakest Spearman correlations with caloric ratio. The two breaks in the table indicate significance levels of 0.01 and 0.05 for the Benjamini-Hochberg q of Crat, corresponding to the first 24 health and/or well-being quantities and then the next four, numbers 25 to 28. The bottom 9 quantities were not significantly correlated with Crat according to our tests. Tabs. S1, S2, and S3 present the same analysis for caloric measures including phrases representing liquids, and for the difference Cdiff(α) = αCout − (1 − α)Cin, both without and with liquids included.


Our health and well-being ranking variables included the CNBC quality of life ranking (#5), Gallup Wellbeing ranking (#9), America’s Health Ranking overall state rank (#10), life expectancy ranking (#11), Brain Health ranking (#20), Gini index score (#23), and George Mason’s overall freedom ranking (#36). Caloric ratio correlated with all of these variables except for George Mason’s freedom ranking (which did not correlate with any of our three measures). Cout correlated significantly with all of these measures except for the Brain Health ranking and the freedom ranking. Caloric input Cin did not correlate significantly with the CNBC quality of life ranking, Gini index score, or freedom ranking.

Regarding correlations with the Big Five personality traits, Pesta et al. noted that “Neuroticism...emerged as the only consistent Big Five predictor of epidemiologic outcomes (e.g., rates of heart disease or high blood pressure) and health-related behaviors (e.g., rates of smoking or exercise)” [36]. Additionally, “neuroticism correlates with many health-related variables, including depression and anxiety disorders, mortality, coping skill, death from cardiovascular disease, and whether one smokes tobacco” [36]. Here, in keeping with these observations, we found that neuroticism (#25) was indeed the only Big Five personality trait that correlated significantly and negatively with caloric ratio.

We also tested our three measures against two measures of socioeconomic status (median income (#17) and percent of state with a bachelor’s degree or higher level of education (#21)) and found these correlations were significant for all three of our measures.


III. CONCLUDING REMARKS 

Our Lexicocalorimeter, when applied to Twitter, has thus revealed a range of strong, commonsensical patterns and correlations for the contiguous US. We invite the reader to explore our online instrument, a screenshot of which is shown in Fig. 8.

Given the complex relationships between health, well-being, happiness, and various measures of socioeconomic status, it is rather difficult to say that we are only measuring health or only measuring well-being. We are also measuring socioeconomic status to some extent. However, the correlations between caloric ratio and measures of socioeconomic status are not as strong as the correlation of caloric ratio with many of the other measures. Given the above, we believe that the caloric content of tweets can be used successfully, along with other well-being and quality of life measures, to help gauge overall well-being in a population.

There are many potential forward directions. A promising avenue is to incorporate tunability to the Lexicocalorimeter by manipulating its dynamic range. While we chose the caloric ratio Crat for its generality in the main body of this work, there is more flexibility in the measurement of caloric difference: Cdiff(α) = αCout − (1 − α)Cin. Though a universal approach is unclear (α should be independent of the particular data set being studied), we may profit from the versatility of Cdiff(α) when focusing on a single demographic. For example, if we are interested in diabetes rates, we could tune the instrument to obtain the best correlation with known levels, and thereby create a real-time estimator. To do so, we would tune α and find the value that gives the highest correlation between Cdiff(α) and diabetes rates for a given set of populations. Of course, we could use a “black box” method to generate a more optimal fit, but in basing our instrument on food and activity words, we have a far more principled approach that grants us the opportunity not just to mimic but to understand and explain patterns that we find. In particular, our word shifts will be of great use in showing why our hypothetical estimate of diabetes is varying across populations.
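One simple way to carry out such tuning is a grid search over α that maximizes the magnitude of the Spearman correlation between Cdiff(α) and the target rate. The sketch below is illustrative only: the five populations and their values are invented, the grid resolution is an arbitrary choice, and the helper names (`average_ranks`, `spearman`, `tune_alpha`) are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def tune_alpha(c_in, c_out, target, steps=100):
    """Grid search for the alpha giving the largest |Spearman correlation|."""
    best_alpha, best_rho = 0.0, 0.0
    for step in range(steps + 1):
        alpha = step / steps
        c_diff = [alpha * o - (1 - alpha) * i for i, o in zip(c_in, c_out)]
        rho = spearman(c_diff, target)
        if abs(rho) > abs(best_rho):
            best_alpha, best_rho = alpha, rho
    return best_alpha, best_rho


# Invented values for five populations (not real data).
c_in = [270.0, 260.0, 255.0, 275.0, 265.0]
c_out = [200.0, 160.0, 210.0, 150.0, 180.0]
diabetes_rate = [7.0, 11.0, 6.5, 12.0, 9.0]
best_alpha, best_rho = tune_alpha(c_in, c_out, diabetes_rate)
```

In practice the tuned α should be validated on populations or time periods not used for tuning, since the search itself can overfit a small set of populations.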

We fully recognize that the Twitter population is not the same as the general population; Twitter users differ from the general population in terms of race, age, and urbanity [7]. However, we currently have no reliable way to know, for example, the true age, race, gender, and education level of individual users and as such, are not able to adjust for these factors. While we were able to vet our food and physical activity lists to some extent (as described in Methods and Materials), we could not realistically go through every tweet to be certain that the phrase was being used in the way that we thought. We realize that even if the phrases are being used as we imagine, it does not necessarily mean that the person who tweeted actually performed the physical activity or ate the tweeted-about food (West et al. address a similar issue in inferring food consumption from accessing recipes online [18]).

We also currently do not know at what point our metric breaks down at smaller time scales (e.g., months or weeks) or for smaller spatial regions (e.g., at the city or county level). Our preliminary research shows that the physical activity metric on its own may be quite effective at the city level, but the food measure may not be accurate on a smaller scale. We have also found the physical activity list to be robust to random partitioning, whereas the food list was not. We believe that these preliminary findings may be due to several factors: (a) the size of the food list (just over 1400 phrases) is much smaller than the physical activity phrase list (just over 13,400 phrases); (b) there are generally more tweets about physical activities in our list than the foods in our food list; and (c) the amount of data within a city may not be a large enough sample for any food-based Twitter metric. We note that we have not tried using the metric on counties or Census block or tract groups, and it may be that these are more conducive to the metric.

We propose to use crowdsourcing as a way to build a more comprehensive food phrase list that includes commonly eaten foods with brand names as well as food slang that we did not capture here. Ideally, we would arrive at a food phrase database similar in scale to that of our existing physical activity phrase list. However we move forward, we believe it is clear that the Lexicocalorimeter we have designed and implemented is already of some potency and may be improved substantively in the future.

[Figure 8: dashboard screenshot. Panel text: “Why Vermont consumes more calories on average” (average US calories = 267.92); “Why Vermont expends more calories on average” (average US caloric expenditure = 176.60; Vermont caloric expenditure = 203.22, rank 3 out of 49).]

FIG. 8. Screenshot of the interactive dashboard for our prototype Lexicocalorimeter site (taken 2015/07/03). An archived development version can be found as part of our paper’s Online Appendices at http://compstorylab.org/share/papers/alajajian2015a/maps.html, and a full dynamic implementation will be part of our Panometer project at http://panometer.org/instruments/lexicocalorimeter. See https://github.com/andyreagan/lexicocalorimeter-appendix for source code.


IV. METHODS AND MATERIALS

In order to estimate the “caloric content” of text-extracted phrases relating to food (caloric input) and physical activity (caloric output), we needed comprehensive lists of foods and physical activities and their respective caloric content and expenditure information. Here, we explain in detail how we constructed these phrase lists and assigned calories to each phrase.

In Dataset S1 (https://dx.doi.org/10.6084/m9.figshare.4530965.v1), we provide message IDs for all tweets that are part of our study, and we make both this dataset and other material and visualizations available at the paper’s Online Appendices (http://compstorylab.org/share/papers/alajajian2015a/) and as part of our Panometer project at http://panometer.org/instruments/lexicocalorimeter. We have drawn on Twitter’s Gardenhose API, which has been provided to the Computational Story Lab by Twitter.



A. Calorie estimates for phrases 

We used the USDA National Nutrient Database [38] to approximate the caloric content of foods, and the Compendium of Physical Activities from Arizona State University and the National Cancer Institute [31] to approximate average Metabolic Equivalent of Tasks (METs) for physical activities, which we converted to calories expended per hour of activity [31]. Because the foods listed in the USDA National Nutrient Database are not described in the way that people talk about food, we created a list of food phrases used on Twitter by starting with a kernel of basic food terms from the food group pages of the USDA’s MyPlate website [40]. If the food phrase was not specific, such as “cereal”, we chose the most popular version of that food in the United States via an informal Google search at the time of the study (in this instance, Cheerios). If a brand name food was not in the USDA National Nutrient Database, we chose the closest match we could find. (Please note that this means that data in the appendix may be inaccurate when searching brand name items.)
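The conversion from METs to calories per hour can be sketched as follows, using the standard convention that 1 MET corresponds to roughly 1 kcal per kilogram of body mass per hour. The reference body mass and the MET values below are placeholders chosen for illustration; they are assumptions, not the values used in our tables or listed in the Compendium.

```python
def mets_to_kcal_per_hour(mets, body_mass_kg):
    """Approximate energy expenditure: 1 MET ~ 1 kcal per kg per hour."""
    return mets * body_mass_kg


REFERENCE_BODY_MASS_KG = 80.0  # hypothetical reference adult, an assumption

# Placeholder MET values for illustration only.
example_mets = {"running": 10.0, "walking": 3.5, "watching tv": 1.0}
kcal_per_hour = {
    activity: mets_to_kcal_per_hour(mets, REFERENCE_BODY_MASS_KG)
    for activity, mets in example_mets.items()
}
```

Because a single reference body mass rescales every activity by the same factor, the choice does not change how activity phrases rank against each other, although it does change the absolute scale of Cout and hence of Crat.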

This approach yielded examples of foods in the food groups of fruits, vegetables, grains, proteins, dairy, oils, solid fats, and “empty calories” (e.g., junk food), and built up a list of nearly 1400 food phrases used on Twitter. For the main results we present in this study, we did not include drinks or soups (liquids) in our list. We found there is very little change in our findings when liquids are included, as we discuss below, and we have omitted them at present both for simplicity and because we were not satisfied with a straightforward way of balancing liquid and solid nutrition estimates. Note that we have included ice creams, oils, and some other items that may act as liquids, and these could be separated out for future versions of our instrument.

For physical activity, we used the physical activities 
listed in the Compendium to build up a list of nearly 
14,000 physical activity phrases used on Twitter. The 
order of magnitude of difference between the length of the 
two lists exists because of the difference in the number of 
terms that went into creating each list and the rates at 
which people tweet about foods vs. physical activities. 


B. Phrase extraction 

A major obstacle to the development of the food and physical activity lists is the determination of those phrases used by individuals that most accurately represent a food or physical activity. Various methods exist which may help one ascertain information about the frequency of usage of higher-order lexical units. However, we require one that not only determines reasonable estimates of frequency of usage, but further, does so with nuance regarding context. For example, one should not count the phrase “apple” as having occurred if it appeared within a larger phrase that was recognized as meaningful, such as “you’re the apple of my eye.” To accomplish these goals, we define a low-assumption text segmentation algorithm, which we refer to as serial partitioning.

Serial text partitioning is a greedy algorithm (see Alg. 1) for finding distinct, coherent subsequences (phrases) within a sequence (clause). It relies on the directionality of a sequence, and so is particularly well suited to processing text into multi-word expressions for many modern languages. The algorithm relies on an objective function, which we will generally refer to as $\mathcal{L}$. At a high level, the algorithm seeks to find the largest subsequences possible, following a chain of optimizing, growing subsequences.

In the context of this article, we define $\mathcal{L}$ relative to a text $T$ as follows, providing pseudocode below. First, let $f : S \to \mathbb{R}_{\geq 0}$ be the random partition frequency function under the pure random partition probability ($q = 1/2$) for the text $T$. We then apply the model of context developed in [41] under the parameterization $q = 1$, so that a given phrase $s$ is a member of $\ell(s)$ contexts $\mathcal{C}_s$ (e.g., the phrase $s = (\text{New}, \text{York}, \text{City})$ is a member of three contexts, labeled $\mathcal{C}_s = \{(*, \text{York}, \text{City}), (\text{New}, *, \text{City}), (\text{New}, \text{York}, *)\}$). Then for $C \in \mathcal{C}_s$, we consider the context-local likelihood probabilities:

\[ P(s \mid C) = \frac{f(s)}{\sum_{t \in C} f(t)}, \tag{7} \]

and prescribe to $s$ the likelihood-minimizing context

\[ C_s = \operatorname*{argmin}_{C \in \mathcal{C}_s} P(s \mid C), \tag{8} \]

which chooses the context-pattern that is most prevalent in $T$. The objective function for this instantiation of serial partitioning is then defined as

\[ \mathcal{L}(s) = P(s \mid C_s), \tag{9} \]

and referred to as the local likelihood of a phrase $s$.
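To make Eqs. (7) to (9) concrete, the following sketch enumerates the single-wildcard contexts of a phrase and evaluates its local likelihood against a toy frequency table. The table uses plain made-up counts in place of the random partition frequency function f, and the function names are hypothetical.

```python
def contexts(phrase):
    """All contexts of a phrase: one word replaced by a wildcard '*'."""
    return [phrase[:i] + ("*",) + phrase[i + 1:] for i in range(len(phrase))]


def matches(context, phrase):
    """True if phrase fits the context pattern position by position."""
    return len(context) == len(phrase) and all(
        c == "*" or c == w for c, w in zip(context, phrase)
    )


def local_likelihood(phrase, freq):
    """min over contexts C of f(s) / sum_{t in C} f(t); zero if undefined."""
    best = None
    for context in contexts(phrase):
        total = sum(f for t, f in freq.items() if matches(context, t))
        p = freq.get(phrase, 0.0) / total if total > 0 else 0.0
        best = p if best is None else min(best, p)
    # The empty phrase has no contexts and scores zero.
    return best if best is not None else 0.0


# Toy frequencies (made-up counts standing in for f).
toy_freq = {("new", "york"): 6.0, ("new", "jersey"): 2.0, ("old", "york"): 4.0}
nyc_contexts = contexts(("new", "york", "city"))
score = local_likelihood(("new", "york"), toy_freq)
```

Here the context (*, york) is the more prevalent one (total frequency 10 versus 8 for (new, *)), so it yields the smaller ratio, 0.6, which becomes the local likelihood.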


Algorithm 1 Serial text partitioning of a (left-to-right) directional clause, given an objective function $\mathcal{L} : S \to \mathbb{R}_{\geq 0}$ (whose maximization is desired, in this case) that is zero on the empty phrase $(\cdot)$, and a clause $t = (t_1, \cdots, t_{\ell(t)})$, consisting of $\ell(t)$ words. Note that for any $a, b \in S$, $a^\frown b = (a_1, \cdots, a_{\ell(a)}, b_1, \cdots, b_{\ell(b)})$ denotes the concatenation of phrases, and that for convenience, a single sequence element, $a_i$, may be treated as a sequence of one term, $(a_i)$.

 1: procedure SerialTextPartitioning(t)
 2:     V ← (·)                          ▷ init. the partition.
 3:     s ← (·)                          ▷ init. the phrase.
 4:     for i ∈ {1, ..., ℓ(t)} do
 5:         if L(s⌢t_i) > L(s) then
 6:             s ← s⌢t_i
 7:         else
 8:             V ← V⌢s
 9:             s ← t_i
10:     return V
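A minimal Python sketch of Algorithm 1 follows. It takes the objective function as an argument, and the demo uses a made-up score table rather than the local likelihood of Eq. (9). Two details are assumptions of this sketch and not part of the listing above: the empty phrase is never appended to the partition, and the phrase still being grown when the clause ends is appended before returning.

```python
def serial_partition(clause, objective):
    """Greedy left-to-right partition of a clause into phrases.

    clause: list of words; objective: function from a tuple of words to a
    non-negative score that is zero on the empty tuple.
    """
    partition = []
    phrase = ()
    for word in clause:
        candidate = phrase + (word,)
        if objective(candidate) > objective(phrase):
            # Growing the phrase improves the objective: keep growing.
            phrase = candidate
        else:
            # Otherwise close the current phrase and start a new one.
            if phrase:
                partition.append(phrase)
            phrase = (word,)
    if phrase:
        partition.append(phrase)  # assumption: flush the final phrase
    return partition


# Made-up scores for illustration; unknown phrases score zero.
toy_scores = {
    ("i",): 0.1,
    ("love",): 0.1,
    ("new",): 0.2,
    ("new", "york"): 0.5,
    ("new", "york", "city"): 0.7,
}
result = serial_partition(
    ["i", "love", "new", "york", "city"],
    lambda p: toy_scores.get(p, 0.0),
)
```

With these scores, "new york city" is kept as one phrase because each extension raises the objective, while "i" and "love" are split because "i love" scores lower than "i" alone.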


We manually applied the following criteria for constructing both food and exercise phrase lists. For a phrase to be included, it had to be a phrase that used the food or physical activity word(s) in a way that pertained to eating or physical activity; we excluded phrases that were part of hashtags, Twitter user names, song lyrics, or names of organizations or businesses, and phrases that appeared four or fewer times were not included. Misspellings and alternate spellings were included if we happened upon them (for example, “mash potatoes” instead of “mashed potatoes”), but we did not go out of our way to search for them. We queried questionable phrases to be sure that the majority of their uses were referring to the item of interest. Because we were building up from a small list, some specific versions of foods were included while more general forms were not. For example, because we built phrases up from “strawberry,” “strawberry jam” was included while we did not conduct a larger search for “jam”. In another example, in building phrases up from “bacon,” “bacon wrapped dates” turned up so we included those dates but did not conduct a larger search for all possible “dates”. (Note: We removed the physical activities category ‘sexual activity’ from the study because the task of determining meaning and context was too difficult.)

We searched for phrases containing the physical activities in multiple tenses in order to capture as much information as possible. For example, for the activity type shoveling snow, we searched for the forms of shovel, shoveling, and shoveled. Tweets were initially converted to all lowercase text, so we were assured that we were not missing data due to capitalization. To match each food phrase with its closest caloric data, we found the most closely corresponding food from the USDA National Nutrient Database, counting all vegetables and fruits in their raw form unless the phrase indicated otherwise. Similarly, we entered meats as roasted or cooked with dry heat, not fried, unless the phrase indicated otherwise or there was no homemade option. We used the nutrition content of homemade versions of foods (for example, baked goods) rather than store-bought foods unless the phrase indicated otherwise. Our approach, while systematic, was not exhaustive, nor is it the only way of taking on this challenge; there are certainly other methods that we expect to yield similar results.

Finally, we lemmatized the food phrases by their code in the USDA National Nutrient Database. If a set of phrases sharing the same code contained a more general food phrase, we used the more general phrase as the lemma.
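The grouping step can be sketched as below. The codes are invented placeholders rather than real database identifiers, and "more general" is approximated here by "fewest words, then alphabetical", which is an assumption of this sketch; in our study the choice of lemma was made by hand.

```python
from collections import defaultdict


def lemmatize_by_code(phrase_codes):
    """Group phrases by database code and pick one lemma per code.

    phrase_codes: phrase -> code. Returns (code -> lemma, phrase -> lemma).
    """
    groups = defaultdict(list)
    for phrase, code in phrase_codes.items():
        groups[code].append(phrase)
    # Heuristic stand-in for "more general": fewest words, ties alphabetical.
    lemma_for_code = {
        code: min(phrases, key=lambda p: (len(p.split()), p))
        for code, phrases in groups.items()
    }
    lemma_for_phrase = {p: lemma_for_code[c] for p, c in phrase_codes.items()}
    return lemma_for_code, lemma_for_phrase


# Invented codes for illustration only.
toy_phrase_codes = {
    "mashed potatoes": "X001",
    "mash potatoes": "X001",
    "homemade mashed potatoes": "X001",
    "bacon": "X002",
    "crispy bacon": "X002",
}
lemma_for_code, lemma_for_phrase = lemmatize_by_code(toy_phrase_codes)
```

Counts for all phrases mapped to the same lemma can then be summed before computing phrase shifts, which is why the phrases shown in the figures are lemmas representing phrase categories.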

We lemmatized the activity phrases by their METs and activity category. Activity categories were largely the same as listed in the Compendium, with slight changes due to items in the Compendium being listed in a Miscellaneous category, etc. This meant that very different physical activity phrases in the same activity category and with the same METs were initially included in the same lemma. From this level of lemmatization, we then used our best judgement to break these lemmas down further until proper phrases were included in each lemma.
FIG. S1. Plots for the contiguous US showing the relationships Crat versus Cin (left), and Crat versus Cout (right). With its larger range, caloric output Cout is more tightly coupled with the ratio Crat.

[Figure: histograms of the contiguous US states, each bar labeled with state name and rank; panel labels include Activity and Ratio.] 

FIG. S3. Histograms as per Fig. 5, with states sorted by activity rank. The bar colors correspond to those used for the 
choropleth maps in Figs. [?] and 3. 


[Figure: Four views of food phrase shifts for Colorado. Panels: A. High calorie foods mentioned more; B. Low calorie foods mentioned less; C. High calorie foods mentioned less; D. Low calorie foods mentioned more. Horizontal axes: per food phrase caloric shift.] 

FIG. S4. Food phrase shifts for Colorado, broken down into the four ways phrases may contribute to a shift. See Fig. [?] for 
the combined shift. See Subsec. Phrase Shifts in Sec. Analysis and Results for an explanation of phrase shifts. 
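
The four ways a phrase may contribute to a shift can be illustrated with a short Python sketch. It assumes the usual word-shift construction, in which a phrase's contribution is the product of its caloric value relative to the reference average and its change in normalized frequency; the exact normalization used for the figures may differ, and the function name and all numbers below are hypothetical.

```python
def classify_phrase_shifts(calories, p_ref, p_comp):
    """Return {phrase: (panel, contribution)} for a comparison vs. a reference.

    calories: phrase -> caloric value.
    p_ref, p_comp: phrase -> normalized frequency in the reference and
    comparison texts (each set of frequencies sums to 1).
    Assumed contribution: (calories - reference average) * (p_comp - p_ref).
    """
    ref_avg = sum(calories[w] * p_ref.get(w, 0.0) for w in calories)
    result = {}
    for w in calories:
        dc = calories[w] - ref_avg
        dp = p_comp.get(w, 0.0) - p_ref.get(w, 0.0)
        if dc == 0 or dp == 0:
            continue  # no contribution to the shift
        if dc > 0 and dp > 0:
            panel = "A: high calorie, mentioned more"
        elif dc < 0 and dp < 0:
            panel = "B: low calorie, mentioned less"
        elif dc > 0 and dp < 0:
            panel = "C: high calorie, mentioned less"
        else:
            panel = "D: low calorie, mentioned more"
        result[w] = (panel, dc * dp)
    return result


# Made-up caloric values and frequencies, for illustration only.
calories = {"bacon": 500.0, "kale": 50.0, "cake": 400.0, "apple": 60.0}
p_ref = {"bacon": 0.25, "kale": 0.25, "cake": 0.25, "apple": 0.25}
p_comp = {"bacon": 0.40, "kale": 0.30, "cake": 0.10, "apple": 0.20}
shifts = classify_phrase_shifts(calories, p_ref, p_comp)
total_shift = sum(contribution for _, contribution in shifts.values())
```

Under this construction, panels A and B raise the comparison text's average relative to the reference and panels C and D lower it, and the contributions sum to the difference between the two averages.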

[Figure: Four views of food phrase shifts for Mississippi. Panels: A. High calorie foods mentioned more; B. Low calorie foods mentioned less; C. High calorie foods mentioned less; D. Low calorie foods mentioned more. Horizontal axes: per food phrase caloric shift.] 

FIG. S5. Food phrase shifts for Mississippi, broken down into the four ways phrases may contribute to a shift. See Fig. [?] 
for the combined shift. See Subsec. Phrase Shifts in Sec. Analysis and Results for an explanation of phrase shifts. 

[Figure: Four views of activity phrase shifts for Colorado. Panels: A. High calorie activities mentioned more; B. Low calorie activities mentioned less; C. High calorie activities mentioned less; D. Low calorie activities mentioned more. Horizontal axes: per activity phrase caloric expenditure shift.] 

FIG. S6. Activity phrase shifts for Colorado, broken down into the four ways phrases may contribute to a shift. See Fig. [?] 
for the combined shift. See Subsec. Phrase Shifts in Sec. Analysis and Results for an explanation of phrase shifts. 

[Figure: Four views of activity phrase shifts for Mississippi. Panels: A. High calorie activities mentioned more; B. Low calorie activities mentioned less; C. High calorie activities mentioned less; D. Low calorie activities mentioned more. Horizontal axes: per activity phrase caloric expenditure shift. Vertical axes: activity rank.] 

FIG. S7. Activity phrase shifts for Mississippi, broken down into the four ways phrases may contribute to a shift. See Fig. [?] 
for the combined shift. See Subsec. Phrase Shifts in Sec. Analysis and Results for an explanation of phrase shifts. 


Health and/or well-being quantity | ρs for Crat | q-val | ρs for Cin | q-val | ρs for Cout | q-val 
1. % no physical activity in past 30 days [24] | -0.78 | 3.07 × 10^-?? | 0.58 | 4.91 × 10^-?? | -0.66 | 1.59 × 10^-?? 
2. % have been physically active in past 30 days [24] | 0.78 | 3.07 × 10^-?? | -0.58 | 5.50 × 10^-?? | 0.67 | 1.31 × 10^-?? 
3. % high blood pressure [24] | -0.77 | 3.07 × 10^-?? | 0.39 | 1.16 × 10^-?? | -0.78 | 3.07 × 10^-?? 
4. Heart disease death rate [27] | -0.75 | 1.02 × 10^-?? | 0.38 | 1.24 × 10^-?? | -0.73 | 2.07 × 10^-?? 
5. Adult diabetes rate [25] | -0.74 | 1.17 × 10^-?? | 0.34 | 2.77 × 10^-?? | -0.77 | 3.07 × 10^-?? 
6. CNBC quality of life ranking [26] | -0.74 | 1.87 × 10^-?? | 0.33 | 3.22 × 10^-?? | -0.77 | 3.60 × 10^-?? 
7. % adult overweight/obesity [27] | -0.71 | 1.33 × 10^-?? | 0.53 | 3.14 × 10^-?? | -0.59 | 3.56 × 10^-?? 
8. Gallup Wellbeing score [1] | 0.7 | 3.17 × 10^-?? | -0.33 | 3.38 × 10^-?? | 0.73 | 4.35 × 10^-?? 
9. % adult obesity [25] | -0.69 | 3.10 × 10^-?? | 0.52 | 4.11 × 10^-?? | -0.59 | 3.56 × 10^-?5 
10. America’s Health Rankings, overall [24] | -0.69 | 1.31 × 10^-?? | 0.4 | 9.14 × 10^-?? | -0.67 | 2.65 × 10^-?? 
11. Life expectancy at birth [27] | 0.67 | 7.92 × 10^-?? | -0.36 | 1.59 × 10^-?? | 0.65 | 2.58 × 10^-?? 
12. % child overweight/obesity [27] | -0.65 | 2.58 × 10^-?? | 0.34 | 2.82 × 10^-?? | -0.64 | 3.06 × 10^-?? 
13. % who eat fruit less than once a day [28] | -0.65 | 2.58 × 10^-?? | 0.57 | 7.45 × 10^-?? | -0.51 | 5.89 × 10^-?? 
14. % who eat vegetables less than once a day [28] | -0.61 | 1.32 × 10^-?? | 0.53 | 3.14 × 10^-?? | -0.46 | 1.72 × 10^-?? 
15. Median daily intake of fruits [28] | 0.59 | 3.56 × 10^-?? | -0.59 | 3.56 × 10^-?? | 0.41 | 5.73 × 10^-?? 
16. Smoking rate [27] | -0.59 | 3.81 × 10^-?? | 0.47 | 1.60 × 10^-?? | -0.48 | 1.24 × 10^-?? 
17. Median daily intake of vegetables [28] | 0.5 | 7.25 × 10^-?? | -0.56 | 1.03 × 10^-?? | 0.31 | 4.09 × 10^-?? 
18. Median household income [27] | 0.48 | 1.37 × 10^-?? | -0.5 | 8.58 × 10^-?? | 0.4 | 9.07 × 10^-?? 
19. % high cholesterol [24] | -0.48 | 1.26 × 10^-?? | 0.24 | 1.16 × 10^-?? | -0.48 | 1.05 × 10^-?? 
20. Colorectal cancer rate [25] | -0.47 | 1.72 × 10^-?? | 0.56 | 1.37 × 10^-?? | -0.27 | 8.35 × 10^-?? 
21. Brain health ranking [29] (lower is better) | -0.46 | 1.95 × 10^-?? | 0.55 | 1.74 × 10^-?? | -0.29 | 5.43 × 10^-?? 
22. US Census Gini index score [30] (lower is better) | -0.44 | 3.60 × 10^-?? | 0.11 | 5.12 × 10^-?? | -0.5 | 6.22 × 10^-?? 
23. % with bachelor’s degree or higher | 0.42 | 4.86 × 10^-?? | -0.43 | 4.21 × 10^-?? | 0.33 | 2.82 × 10^-?? 
24. Avg poor mental health days, past 30 days [24] | -0.39 | 9.87 × 10^-?? | 0.1 | 5.31 × 10^-?? | -0.48 | 1.23 × 10^-?? 
25. Neuroticism Big Five personality trait [31] | -0.37 | 1.33 × 10^-?2 | 0.23 | 1.35 × 10^-?? | -0.37 | 1.42 × 10^-?? 
26. Binge drinking rate [24] | 0.34 | 2.91 × 10^-?? | -0.12 | 4.88 × 10^-?? | 0.41 | 6.23 × 10^-?? 
27. Farmers markets per 100,000 in pop. [28] | 0.33 | 2.96 × 10^-?? | -0.01 | 9.59 × 10^-?? | 0.42 | 5.41 × 10^-?? 
28. Extraversion Big Five personality trait [31] | -0.33 | 2.83 × 10^-?? | 0.13 | 4.13 × 10^-?? | -0.29 | 5.36 × 10^-?? 
29. Avg poor physical health days, past 30 days [24] | -0.32 | 3.81 × 10^-?? | 0.16 | 3.32 × 10^-?? | -0.38 | 1.16 × 10^-?? 
30. Strolling of the Heifers locavore score (lower is better) [32] | -0.31 | 4.59 × 10^-?? | -0.16 | 3.32 × 10^-?? | -0.45 | 3.16 × 10^-?? 
31. % schools offering fruit/veg at celebrations [28] | 0.25 | 1.16 × 10^-?? | -0.38 | 1.36 × 10^-?? | 0.05 | 7.75 × 10^-?? 
32. Openness Big Five personality trait [31] | 0.23 | 1.31 × 10^-?? | -0.42 | 5.43 × 10^-?? | 0.04 | 7.95 × 10^-?? 
33. % cropland harvested for fruits/veg [28] | 0.18 | 2.53 × 10^-?? | -0.53 | 2.90 × 10^-?? | -0.04 | 7.95 × 10^-?? 
34. Conscientiousness Big Five personality trait [31] | -0.1 | 5.31 × 10^-?? | 0.14 | 3.97 × 10^-?? | -0.05 | 7.78 × 10^-?? 
35. % census tracts, healthy food retailer within 1/2 mile [28] | -0.06 | 7.47 × 10^-?? | -0.39 | 1.09 × 10^-?? | -0.24 | 1.28 × 10^-?? 
36. George Mason overall freedom ranking [33] (lower is freer) | -0.02 | 8.90 × 10^-?? | -0.05 | 7.73 × 10^-?? | -0.1 | 5.58 × 10^-?? 
37. Agreeableness Big Five personality trait [31] | 0 | 9.95 × 10^-?? | 0.24 | 1.26 × 10^-?? | 0.08 | 6.41 × 10^-?? 


TABLE S1. Identical to Tab. I but with liquids included. Spearman correlation coefficients, ρs, and Benjamini-Hochberg 
q-values relating caloric input Cin, caloric output Cout, and caloric ratio Crat = Cout/Cin to demographic data related to food and 
physical activity, Big Five personality traits [31], health and well-being rankings by state, and socioeconomic status. Rows are 
ordered from strongest to weakest Spearman correlation with caloric ratio. 
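
For readers who want to reproduce the two statistics named in the caption, the following is a minimal pure-Python sketch of a Spearman rank correlation (with average ranks for ties) and the Benjamini-Hochberg adjustment. It is not the code used to produce the table: the function names and the sample inputs are hypothetical, and the p-values fed to the adjustment are assumed to come from a separate significance test.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Made-up inputs for illustration.
rho_up = spearman([10, 20, 30, 40], [1, 4, 9, 16])
rho_down = spearman([10, 20, 30, 40], [8, 6, 4, 2])
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Because Spearman's coefficient depends only on ranks, any monotone transformation of either variable leaves it unchanged, which suits quantities such as state rankings.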


Health and/or well-being quantity | ρs for Cdiff | q-val | ρs for Cin | q-val | ρs for Cout | q-val 
1. % no physical activity in past 30 days [24] | -0.79 | 1.77 × 10^-?? | 0.58 | 5.67 × 10^-?? | -0.66 | 1.51 × 10^-?? 
2. % have been physically active in past 30 days [24] | 0.79 | 1.77 × 10^-?? | -0.57 | 6.53 × 10^-?? | 0.67 | 1.24 × 10^-?? 
3. % high blood pressure [24] | -0.78 | 2.72 × 10^-?? | 0.32 | 4.05 × 10^-?? | -0.78 | 2.72 × 10^-?? 
4. Adult diabetes rate [25] | -0.76 | 5.26 × 10^-?? | 0.29 | 6.16 × 10^-?? | -0.77 | 2.73 × 10^-?? 
5. CNBC quality of life ranking [26] | -0.75 | 8.07 × 10^-?? | 0.28 | 7.34 × 10^-?? | -0.77 | 3.60 × 10^-?? 
6. % adult overweight/obesity [27] | -0.73 | 2.40 × 10^-?? | 0.55 | 1.41 × 10^-?? | -0.59 | 3.07 × 10^-?? 
7. Heart disease death rate [27] | -0.73 | 2.07 × 10^-?? | 0.34 | 2.82 × 10^-?? | -0.73 | 2.07 × 10^-?? 
8. Gallup Wellbeing score [1] | 0.73 | 3.83 × 10^-?? | -0.31 | 4.43 × 10^-?? | 0.73 | 3.70 × 10^-?? 
9. % adult obesity [25] | -0.72 | 3.70 × 10^-?? | 0.53 | 2.26 × 10^-?? | -0.59 | 2.94 × 10^-?? 
10. America’s Health Rankings, overall [24] | -0.72 | 3.93 × 10^-?? | 0.43 | 4.74 × 10^-?? | -0.67 | 2.77 × 10^-?? 
11. Life expectancy at birth [27] | 0.68 | 4.27 × 10^-?? | -0.4 | 6.91 × 10^-?? | 0.65 | 2.64 × 10^-?? 
12. % who eat fruit less than once a day [28] | -0.67 | 9.44 × 10^-?? | 0.61 | 1.38 × 10^-?? | -0.51 | 5.23 × 10^-?? 
13. % child overweight/obesity [27] | -0.64 | 3.03 × 10^-?? | 0.27 | 7.55 × 10^-?? | -0.64 | 3.06 × 10^-?? 
14. % who eat vegetables less than once a day [28] | -0.61 | 1.38 × 10^-?? | 0.51 | 5.21 × 10^-?? | -0.46 | 1.57 × 10^-?? 
15. Median daily intake of fruits [28] | 0.6 | 1.68 × 10^-?? | -0.62 | 8.33 × 10^-?? | 0.41 | 5.44 × 10^-?? 
16. Smoking rate [27] | -0.6 | 2.14 × 10^-?? | 0.51 | 5.19 × 10^-?? | -0.48 | 1.08 × 10^-?? 
17. Median household income [27] | 0.51 | 5.19 × 10^-?? | -0.53 | 3.27 × 10^-?? | 0.4 | 8.38 × 10^-?? 
18. Median daily intake of vegetables [28] | 0.5 | 5.72 × 10^-?? | -0.56 | 7.44 × 10^-?? | 0.31 | 4.36 × 10^-?? 
19. Brain health ranking [29] (lower is better) | -0.5 | 7.50 × 10^-?? | 0.62 | 1.38 × 10^-?? | -0.29 | 5.70 × 10^-?? 
20. % high cholesterol [24] | -0.49 | 7.88 × 10^-?? | 0.23 | 1.45 × 10^-?? | -0.48 | 9.05 × 10^-?? 
21. % with bachelor’s degree or higher [6] | 0.47 | 1.48 × 10^-?? | -0.54 | 1.66 × 10^-?? | 0.33 | 2.82 × 10^-?? 
22. Colorectal cancer rate [25] | -0.44 | 3.82 × 10^-?? | 0.53 | 3.59 × 10^-?? | -0.27 | 8.25 × 10^-?? 
23. US Census Gini index score [30] (lower is better) | -0.42 | 4.99 × 10^-?? | -0.03 | 8.45 × 10^-?? | -0.5 | 5.55 × 10^-?? 
24. Avg poor mental health days, past 30 days [24] | -0.42 | 5.44 × 10^-?? | 0.12 | 4.75 × 10^-?? | -0.48 | 1.06 × 10^-?? 
25. Neuroticism Big Five personality trait [31] | -0.38 | 1.13 × 10^-?2 | 0.2 | 2.03 × 10^-?? | -0.37 | 1.42 × 10^-?? 
26. Binge drinking rate [24] | 0.38 | 1.32 × 10^-?? | -0.15 | 3.56 × 10^-?? | 0.41 | 5.84 × 10^-?? 
27. Avg poor physical health days, past 30 days [24] | -0.35 | 2.34 × 10^-?? | 0.19 | 2.19 × 10^-?? | -0.38 | 1.13 × 10^-?? 
28. Farmers markets per 100,000 in pop. [28] | 0.33 | 2.82 × 10^-?? | 0.06 | 7.17 × 10^-?? | 0.42 | 5.05 × 10^-?? 
29. Strolling of the Heifers locavore score (lower is better) [32] | -0.29 | 6.44 × 10^-?? | -0.3 | 5.41 × 10^-?? | -0.45 | 2.94 × 10^-?? 
30. Extraversion Big Five personality trait [31] | -0.28 | 6.89 × 10^-?? | 0.03 | 8.50 × 10^-?? | -0.29 | 5.63 × 10^-?? 
31. % schools offering fruit/veg at celebrations [28] | 0.24 | 1.26 × 10^-?? | -0.46 | 1.96 × 10^-?? | 0.05 | 7.90 × 10^-?? 
32. Openness Big Five personality trait [31] | 0.24 | 1.26 × 10^-?? | -0.5 | 6.11 × 10^-?? | 0.04 | 8.10 × 10^-?? 
33. % cropland harvested for fruits/veg [28] | 0.19 | 2.35 × 10^-?? | -0.62 | 1.37 × 10^-?? | -0.04 | 8.10 × 10^-?? 
34. Conscientiousness Big Five personality trait [31] | -0.12 | 4.62 × 10^-?? | 0.2 | 2.10 × 10^-?? | -0.05 | 7.93 × 10^-?? 
35. % census tracts, healthy food retailer within 1/2 mile [28] | -0.02 | 8.86 × 10^-?? | -0.52 | 3.68 × 10^-?? | -0.24 | 1.28 × 10^-?? 
36. George Mason overall freedom ranking [33] (lower is freer) | -0.02 | 8.88 × 10^-?? | -0.11 | 5.15 × 10^-?? | -0.1 | 5.64 × 10^-?? 
37. Agreeableness Big Five personality trait [31] | -0.01 | 9.42 × 10^-?? | 0.22 | 1.50 × 10^-?? | 0.08 | 6.47 × 10^-?? 

TABLE S2. Identical to Tab. I but using a caloric difference rather than caloric ratio. Spearman correlation coefficients, ρs, 
and Benjamini-Hochberg q-values relating caloric input Cin, caloric output Cout, and caloric difference Cdiff(α) = αCout + (1 − α)Cin 
to demographic data related to food and physical activity, Big Five personality traits [31], health and well-being rankings by 
state, and socioeconomic status. Rows are ordered from strongest to weakest Spearman correlation with caloric ratio. We 
chose α so that the average of Cout matched the average of αCin. 


Health and/or well-being quantity | ρs for Cdiff | q-val | ρs for Cin | q-val | ρs for Cout | q-val 
1. % no physical activity in past 30 days [24] | -0.78 | 3.42 × 10^-?? | 0.58 | 4.91 × 10^-?? | -0.66 | 1.59 × 10^-?? 
2. % have been physically active in past 30 days [24] | 0.78 | 3.42 × 10^-?? | -0.58 | 5.50 × 10^-?? | 0.67 | 1.39 × 10^-?? 
3. % high blood pressure [24] | -0.77 | 3.60 × 10^-?? | 0.39 | 1.16 × 10^-?? | -0.78 | 3.42 × 10^-?? 
4. Heart disease death rate [27] | -0.75 | 1.09 × 10^-?? | 0.38 | 1.24 × 10^-?? | -0.73 | 2.07 × 10^-?? 
5. Adult diabetes rate [25] | -0.74 | 1.25 × 10^-?? | 0.34 | 2.77 × 10^-?? | -0.77 | 3.42 × 10^-?? 
6. CNBC quality of life ranking [26] | -0.74 | 2.07 × 10^-?? | 0.33 | 3.22 × 10^-?? | -0.77 | 3.60 × 10^-?? 
7. % adult overweight/obesity [27] | -0.7 | 1.48 × 10^-?? | 0.53 | 3.14 × 10^-?? | -0.59 | 3.56 × 10^-?? 
8. Gallup Wellbeing score [1] | 0.7 | 3.08 × 10^-?? | -0.33 | 3.38 × 10^-?? | 0.73 | 4.35 × 10^-?? 
9. % adult obesity [25] | -0.69 | 3.40 × 10^-?? | 0.52 | 4.11 × 10^-?? | -0.59 | 3.56 × 10^-?5 
10. America’s Health Rankings, overall [24] | -0.69 | 1.39 × 10^-?? | 0.4 | 9.14 × 10^-?? | -0.67 | 2.77 × 10^-?? 
11. Life expectancy at birth [27] | 0.67 | 9.05 × 10^-?? | -0.36 | 1.59 × 10^-?? | 0.65 | 2.67 × 10^-?? 
12. % who eat fruit less than once a day [28] | -0.65 | 2.67 × 10^-?? | 0.57 | 7.45 × 10^-?? | -0.51 | 5.89 × 10^-?? 
13. % child overweight/obesity [27] | -0.64 | 3.06 × 10^-?? | 0.34 | 2.78 × 10^-?? | -0.64 | 3.06 × 10^-?? 
14. % who eat vegetables less than once a day [28] | -0.61 | 1.54 × 10^-?? | 0.53 | 3.14 × 10^-?? | -0.46 | 1.69 × 10^-?? 
15. Median daily intake of fruits [28] | 0.59 | 3.56 × 10^-?? | -0.59 | 3.56 × 10^-?? | 0.41 | 5.73 × 10^-?? 
16. Smoking rate [27] | -0.59 | 3.77 × 10^-?? | 0.47 | 1.60 × 10^-?? | -0.48 | 1.24 × 10^-?? 
17. Median daily intake of vegetables [28] | 0.5 | 7.64 × 10^-?? | -0.56 | 1.03 × 10^-?? | 0.31 | 4.09 × 10^-?? 
18. Median household income [27] | 0.48 | 1.38 × 10^-?? | -0.5 | 8.58 × 10^-?? | 0.4 | 9.07 × 10^-?? 
19. % high cholesterol [24] | -0.48 | 1.28 × 10^-?? | 0.24 | 1.15 × 10^-?? | -0.48 | 1.05 × 10^-?? 
20. Colorectal cancer rate [25] | -0.47 | 1.68 × 10^-?? | 0.56 | 1.37 × 10^-?? | -0.27 | 8.35 × 10^-?? 
21. Brain health ranking [29] (lower is better) | -0.46 | 1.91 × 10^-?? | 0.55 | 1.74 × 10^-?? | -0.29 | 5.43 × 10^-?? 
22. US Census Gini index score [30] (lower is better) | -0.44 | 3.41 × 10^-?? | 0.11 | 5.12 × 10^-?? | -0.5 | 6.22 × 10^-?? 
23. % with bachelor’s degree or higher | 0.42 | 4.99 × 10^-?? | -0.43 | 4.21 × 10^-?? | 0.33 | 2.78 × 10^-?? 
24. Avg poor mental health days, past 30 days [24] | -0.39 | 1.05 × 10^-?? | 0.1 | 5.31 × 10^-?? | -0.48 | 1.23 × 10^-?? 
25. Neuroticism Big Five personality trait [31] | -0.37 | 1.30 × 10^-?? | 0.23 | 1.35 × 10^-?? | -0.37 | 1.42 × 10^-?? 
26. Extraversion Big Five personality trait [31] | -0.34 | 2.78 × 10^-?? | 0.13 | 4.13 × 10^-?? | -0.29 | 5.36 × 10^-?? 
27. Farmers markets per 100,000 in pop. [28] | 0.33 | 2.88 × 10^-?? | -0.01 | 9.59 × 10^-?? | 0.42 | 5.41 × 10^-?? 
28. Binge drinking rate [24] | 0.33 | 2.88 × 10^-?? | -0.12 | 4.88 × 10^-?? | 0.41 | 6.23 × 10^-?? 
29. Avg poor physical health days, past 30 days [24] | -0.32 | 3.83 × 10^-?? | 0.16 | 3.32 × 10^-?? | -0.38 | 1.16 × 10^-?? 
30. Strolling of the Heifers locavore score (lower is better) [32] | -0.31 | 4.52 × 10^-?? | -0.16 | 3.32 × 10^-?? | -0.45 | 3.16 × 10^-?? 
31. % schools offering fruit/veg at celebrations [28] | 0.25 | 1.13 × 10^-?? | -0.38 | 1.36 × 10^-?? | 0.05 | 7.75 × 10^-?? 
32. Openness Big Five personality trait [31] | 0.23 | 1.30 × 10^-?? | -0.42 | 5.43 × 10^-?? | 0.04 | 7.95 × 10^-?? 
33. % cropland harvested for fruits/veg [28] | 0.18 | 2.58 × 10^-?? | -0.53 | 2.90 × 10^-?? | -0.04 | 7.95 × 10^-?? 
34. Conscientiousness Big Five personality trait [31] | -0.1 | 5.31 × 10^-?? | 0.14 | 3.97 × 10^-?? | -0.05 | 7.78 × 10^-?? 
35. % census tracts, healthy food retailer within 1/2 mile [28] | -0.06 | 7.41 × 10^-?? | -0.39 | 1.09 × 10^-?? | -0.24 | 1.28 × 10^-?? 
36. George Mason overall freedom ranking [33] (lower is freer) | -0.02 | 8.82 × 10^-?? | -0.05 | 7.73 × 10^-?? | -0.1 | 5.58 × 10^-?? 
37. Agreeableness Big Five personality trait [31] | 0 | 9.85 × 10^-?? | 0.24 | 1.26 × 10^-?? | 0.08 | 6.41 × 10^-?? 

TABLE S3. Identical to Tab. I but including liquids and using a caloric difference rather than caloric ratio. Spearman 
correlation coefficients, ρs, and Benjamini-Hochberg q-values relating caloric input Cin, caloric output Cout, and caloric difference 
Cdiff(α) = αCout + (1 − α)Cin to demographic data related to food and physical activity, Big Five personality traits [31], 
health and well-being rankings by state, and socioeconomic status. Rows are ordered from strongest to weakest Spearman 
correlation with caloric ratio. We chose α so that the average of Cout matched the average of αCin. 
